Natural Language Generation (NLG) has improved rapidly in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has made NLG more fluent and coherent, which in turn has driven progress in downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, it is also apparent that deep-learning-based generation is prone to hallucinating unintended text, which degrades system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have proposed ways to measure and mitigate hallucinated text, but these have never been reviewed comprehensively. In this survey, we therefore provide a broad overview of the research progress and challenges on the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks: abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey aims to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated text in NLG.
Generative models, particularly GANs, have been used for image editing. Although GAN-based methods perform well at generating plausible content aligned with the user's intentions, they struggle to strictly preserve the content outside the editing region. To address this issue, we use diffusion models instead of GANs and propose a novel image-editing method based on pixel-wise guidance. Specifically, we first train pixel classifiers with a small amount of annotated data and then estimate the semantic segmentation map of a target image. Users then manipulate the map to specify how the image should be edited. The diffusion model generates an edited image under guidance from the pixel-wise classifiers, such that the resulting image aligns with the manipulated map. Because the guidance is applied pixel-wise, the proposed method can create plausible content in the editing region while preserving the content outside it. The experimental results validate the advantages of the proposed method both quantitatively and qualitatively.
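The guidance step can be illustrated with the standard classifier-guidance update (Dhariwal and Nichol), in which the mean of each reverse-diffusion step is shifted by scale · Σ · ∇ log p(y | x_t). The sketch below is a toy under stated assumptions, not the authors' implementation: it uses a hypothetical per-pixel logistic classifier that sees only its own pixel value (the paper trains neural pixel classifiers), a diagonal covariance, a flattened image, and it returns the guided mean instead of drawing a sample. Because log p(y | x_t) is a sum of per-pixel terms, each pixel is pushed toward its own label in the manipulated map.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pixelwise_guided_mean(mean: List[float], var: List[float], x_t: List[float],
                          target_map: List[int], w: float = 1.0, b: float = 0.0,
                          scale: float = 1.0) -> List[float]:
    """Return mean + scale * var * grad_x log p(y | x_t) for a flattened image.

    Hypothetical toy classifier (an assumption, not the paper's network):
    p(y_i = 1 | x_t) = sigmoid(w * x_t[i] + b), independently per pixel, so
    log p(y | x_t) = sum_i log p(y_i | x_t) and
    d/dx_t[i] log p(y | x_t) = w * (y_i - sigmoid(w * x_t[i] + b)).
    """
    guided = []
    for m, v, x, y in zip(mean, var, x_t, target_map):
        grad = w * (y - sigmoid(w * x + b))   # pushes pixel i toward its own label y_i
        guided.append(m + scale * v * grad)   # diagonal covariance assumed
    return guided


# Usage: pixel 0 is labeled 1 in the manipulated map, pixel 1 is labeled 0.
print(pixelwise_guided_mean([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1, 0], scale=2.0))  # [1.0, -1.0]
```

Setting `scale=0.0` recovers the unguided mean, which is one way to see how the guidance strength trades off fidelity to the map against the diffusion prior.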
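Data augmentation is an important technique for improving the accuracy of deep-learning-based object recognition. Methods that generate mixed data from multiple samples, such as mixup, can acquire new diversity not contained in the training data and thereby help improve accuracy. However, because the data to be mixed are selected at random throughout the training process, appropriate classes or data are not selected in some cases. In this study, we propose a data augmentation method that calculates the distance between classes based on class probabilities and selects data from suitable classes to mix during training. The mixed data are adjusted dynamically according to the training trend of each class to facilitate training. The proposed method is used in combination with conventional methods to generate mixed data. Evaluation experiments show that the proposed method improves recognition performance on general and long-tailed image recognition datasets.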
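A minimal sketch of class-aware partner selection followed by standard mixup. The abstract does not specify the distance or the selection rule, so both are assumptions here: the distance between two classes is taken as the Euclidean distance between their average predicted probability vectors, and the "suitable" partner is assumed to be the closest (most confusable) other class. The function names are hypothetical. Recomputing `mean_probs` from the model's predictions during training is one way the selection could follow each class's training trend.

```python
import math
from typing import List, Sequence, Tuple


def class_distance_matrix(mean_probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """mean_probs[c] is the average softmax output over training samples of class c.

    Assumed distance: Euclidean distance between two classes' average probability vectors.
    """
    n = len(mean_probs)
    return [[math.dist(mean_probs[a], mean_probs[b]) for b in range(n)] for a in range(n)]


def pick_partner_class(dist: List[List[float]], c: int) -> int:
    """Assumed selection rule: mix class c with its closest (most confusable) other class."""
    others = [k for k in range(len(dist)) if k != c]
    return min(others, key=lambda k: dist[c][k])


def mixup(x1, y1, x2, y2, lam: float) -> Tuple[List[float], List[float]]:
    """Standard mixup: convex combination of two inputs and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


mean_probs = [[0.80, 0.15, 0.05],   # class 0 is often confused with class 1
              [0.20, 0.70, 0.10],
              [0.05, 0.05, 0.90]]
dist = class_distance_matrix(mean_probs)
print(pick_partner_class(dist, 0))                                # 1
print(mixup([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1], lam=0.25))    # ([0.25, 0.75], [0.25, 0.75])
```

In standard mixup, `lam` is drawn from a Beta(α, α) distribution, for example with `random.Random(0).betavariate(0.2, 0.2)`.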
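In object detection, the amount of data and its cost are a trade-off, and collecting a large amount of data in a specific domain is labor-intensive. Therefore, existing large-scale datasets are used for pre-training. However, when the target domain differs significantly from the source domain, conventional transfer learning and domain adaptation cannot bridge the domain gap. We propose a data synthesis method that addresses this large-domain-gap problem. In this method, a part of the target image is pasted onto the source image, and the position of the pasted region is aligned by using the object bounding-box information. In addition, we introduce adversarial learning to discriminate whether a region is original or pasted. The proposed method is trained on a large number of source images and a few target-domain images. In a problem setting with very different domains, where RGB images are the source domain and thermal infrared images are the target domain, the proposed method achieves higher accuracy than conventional methods. The proposed method also achieves higher accuracy when transferring from simulated images to real images.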
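The pasting step, together with the per-pixel label an adversarial discriminator could be trained on, can be sketched as follows. This is a simplified sketch, not the paper's procedure: images are single-channel 2-D lists, boxes use a hypothetical `(top, left, height, width)` convention, and we assume the caller derives `src_box` from the source object's bounding box, since the abstract does not state exactly how the alignment is computed. The returned mask marks pasted pixels with 1 and original pixels with 0.

```python
from typing import List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # hypothetical (top, left, height, width) convention


def paste_with_mask(source: Image, target: Image, src_box: Box,
                    tgt_box: Box) -> Tuple[Image, List[List[int]]]:
    """Copy the tgt_box region of target into source at src_box.

    Returns the composite image and a binary mask (1 = pasted pixel) that could
    serve as the 'original vs. pasted' label for an adversarial discriminator.
    """
    top_s, left_s, h, w = src_box
    top_t, left_t, h_t, w_t = tgt_box
    if (h, w) != (h_t, w_t):
        raise ValueError("source and target boxes must have the same size")
    composite = [row[:] for row in source]       # leave the input image untouched
    mask = [[0] * len(row) for row in source]    # 1 marks a pasted pixel
    for i in range(h):
        for j in range(w):
            composite[top_s + i][left_s + j] = target[top_t + i][left_t + j]
            mask[top_s + i][left_s + j] = 1
    return composite, mask


source = [[0.0] * 4 for _ in range(4)]   # stand-in for a source-domain (e.g. RGB) channel
target = [[9.0] * 4 for _ in range(4)]   # stand-in for a target-domain (e.g. thermal) image
# Assumption: src_box would be chosen from the source object's bounding box.
composite, mask = paste_with_mask(source, target, src_box=(1, 1, 2, 2), tgt_box=(0, 0, 2, 2))
print(sum(map(sum, mask)))  # 4
```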
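Computer-generated holograms (CGHs) are used in holographic three-dimensional (3D) displays and holographic projection. The quality of images reconstructed from phase-only CGHs is degraded because the amplitude of the reconstructed image is difficult to control. Iterative optimization methods such as the Gerchberg-Saxton (GS) algorithm are one option for improving image quality: they optimize CGHs iteratively to obtain higher image quality. However, such iterative computation is time-consuming, and the improvement in image quality often stagnates. Recently, deep-learning-based hologram computation has been proposed, in which deep neural networks infer CGHs directly from input image data. However, this approach is limited to reconstructing images of the same size as the hologram. In this study, we use deep learning to optimize phase-only CGHs generated using scaled diffraction calculation and a random-phase method. Combining the random-phase method with scaled diffraction calculation makes it possible to handle zoomable reconstructed images larger than the hologram. Compared with the GS algorithm, the proposed method achieves both higher quality and faster optimization.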
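For reference, the Gerchberg-Saxton baseline alternates between the hologram plane, where only the phase is kept, and the reconstruction plane, where the target amplitude is imposed. The sketch below assumes Fourier (far-field) propagation computed with a plain 1-D DFT for brevity; it does not implement the scaled diffraction calculation or the random-phase method used by the proposed approach. The recorded amplitude error is non-increasing from one iteration to the next (the error-reduction property of GS), but in practice it often levels off well above zero, which is the stagnation the abstract refers to.

```python
import cmath
import math
import random
from typing import List, Tuple


def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)) for k in range(n)]


def idft(X: List[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n for m in range(n)]


def gerchberg_saxton(target_amp: List[float], iterations: int = 50,
                     seed: int = 0) -> Tuple[List[complex], List[float]]:
    """1-D GS for a phase-only Fourier hologram (far-field propagation assumed)."""
    rng = random.Random(seed)
    # Image plane: target amplitude with a random initial phase.
    field = [a * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for a in target_amp]
    hologram: List[complex] = []
    errors: List[float] = []
    for _ in range(iterations):
        # Back-propagate to the hologram plane and keep only the phase (unit amplitude).
        hologram = [cmath.exp(1j * cmath.phase(h)) for h in idft(field)]
        # Forward-propagate and record the squared amplitude mismatch.
        recon = dft(hologram)
        errors.append(sum((abs(r) - a) ** 2 for r, a in zip(recon, target_amp)))
        # Impose the target amplitude while keeping the reconstructed phase.
        field = [a * cmath.exp(1j * cmath.phase(r)) for r, a in zip(recon, target_amp)]
    return hologram, errors


n = 16
# A bright bar, scaled so its energy matches sum|DFT(h)|^2 = n * n for |h| = 1.
target = [math.sqrt(2.0 * n) if 4 <= k < 12 else 0.0 for k in range(n)]
hologram, errors = gerchberg_saxton(target, iterations=30)
print(round(errors[0], 3), round(errors[-1], 3))
```

Each iteration costs two full propagations, so the running time grows linearly with the iteration count; this is the cost that a single network inference is meant to avoid.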
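Although recent learning-based calibration methods can predict extrinsic and intrinsic camera parameters from a single image, their accuracy degrades on fisheye images. This degradation is caused by a mismatch between the actual projection and the expected projection. To address this problem, we propose a generic camera model that can handle various types of distortion. Our generic camera model is used in learning-based methods through a closed-form calculation of the camera projection. To recover rotation and fisheye distortion simultaneously, we propose a learning-based calibration method that uses this camera model. Furthermore, we propose a loss function that alleviates the bias in the magnitude of errors across four extrinsic and intrinsic camera parameters. Extensive experiments demonstrate that the proposed method outperforms conventional methods on two large-scale datasets and on images captured by off-the-shelf fisheye cameras. Moreover, we are the first researchers to analyze the performance of learning-based methods using various types of projection for off-the-shelf cameras.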
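As an illustration of what a closed-form fisheye projection looks like, the sketch below uses a common form in which the image radius is an odd polynomial in the incident angle, r = f(θ + k1·θ³), one member of the Kannala-Brandt family. This is an assumed model chosen for illustration, not necessarily the generic camera model proposed in the paper. Because θ is computed with `atan2`, points more than 90° off the optical axis still project, and the expression is differentiable in `f` and `k1`, which is what allows such a model to sit inside a learning-based pipeline.

```python
import math
from typing import Tuple


def project_fisheye(point: Tuple[float, float, float], f: float, cx: float, cy: float,
                    k1: float = 0.0) -> Tuple[float, float]:
    """Project a camera-frame 3-D point to pixel coordinates.

    Assumed model: r = f * (theta + k1 * theta**3), where theta is the incident angle
    from the optical axis. k1 = 0 reduces to the equidistant projection r = f * theta.
    """
    x, y, z = point
    theta = math.atan2(math.hypot(x, y), z)   # also valid beyond 90 degrees (z < 0)
    phi = math.atan2(y, x)                    # azimuth around the optical axis
    r = f * (theta + k1 * theta ** 3)
    return cx + r * math.cos(phi), cy + r * math.sin(phi)


print(project_fisheye((0.0, 0.0, 1.0), f=100.0, cx=320.0, cy=240.0))  # (320.0, 240.0)
print(project_fisheye((1.0, 0.0, 1.0), f=100.0, cx=320.0, cy=240.0))  # u = 320 + 100 * pi / 4
```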
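To realize continuous frailty care in the daily lives of older adults, we propose Ahobo, a frailty-care robot for older adults at home. Two support systems are implemented on Ahobo to support older adults in terms of physical health and mental well-being. For physical frailty care, we focus on blood pressure and develop a system that supports blood pressure measurement with Ahobo. For mental frailty care, we implement coloring with Ahobo as a recreational activity with the robot. The usability of the systems is evaluated under the assumption of continuous use in daily life. For the blood pressure measurement support system, we conducted a qualitative evaluation by questionnaire with 16 subjects, including older adults, who measured their blood pressure with the system. The results confirmed that the proposed robot does not affect blood pressure readings and, based on subjective evaluation, is acceptable in terms of ease of use. For the recreational coloring interaction, a subjective evaluation was conducted with two older adults under a verbal fluency task, and it was confirmed that the interaction can be used continuously in daily life. Widespread use of the proposed robot as an interface for AI that supports daily life will lead to a society in which AI robots provide support from the cradle to the grave.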
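The Transformer architecture has brought fundamental changes to the field of computational linguistics, which had been dominated by recurrent neural networks for many years. Its success also implies drastic changes in cross-modal tasks involving language and vision, and many researchers have already tackled the problem. In this paper, we review some of the most critical milestones in the field, as well as overall trends in how the Transformer architecture has been incorporated into visuolinguistic cross-modal tasks. Furthermore, we discuss its current limitations and speculate on some of the prospects that we find imminent.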
We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given the input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low: for neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.
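The virtual adversarial loss can be sketched for a toy linear softmax classifier p(y | x) = softmax(Wx + b), where the gradient needed by the power-iteration step has the closed form W^T (q - p). This is a minimal sketch under stated assumptions (a linear model instead of a neural network, one power iteration by default, pure Python lists), not the authors' code. It follows the recipe in the abstract: no label is used, the adversarial direction comes from the gradient of KL(p || q) at x + ξd (which a neural network would obtain with one forward- and back-propagation pair), and the KL divergence at x + r_adv is the virtual adversarial loss. In training, this value, with r_adv held fixed and p treated as a constant, would be added to the supervised loss.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W: Matrix, b: List[float], x: List[float]) -> List[float]:
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]


def kl(p: List[float], q: List[float]) -> float:
    return sum(pk * (math.log(pk) - math.log(qk)) for pk, qk in zip(p, q) if pk > 0)


def unit(v: List[float]) -> List[float]:
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v] if n > 0 else [0.0 for _ in v]


def vat_loss(W: Matrix, b: List[float], x: List[float], eps: float = 1.0,
             xi: float = 1e-6, power_iters: int = 1, seed: int = 0) -> Tuple[float, List[float]]:
    """Virtual adversarial loss for a toy linear softmax model p(y|x) = softmax(Wx + b).

    No label is used: the model's own prediction p at x is the (constant) target.
    """
    rng = random.Random(seed)
    p = softmax(logits(W, b, x))
    d = unit([rng.gauss(0.0, 1.0) for _ in x])          # random initial direction
    for _ in range(power_iters):
        # Gradient of KL(p || q) w.r.t. the input at x + xi*d; for this model it is
        # W^T (q - p), which approximates xi * H d (H = Hessian of the KL at x).
        q = softmax(logits(W, b, [xj + xi * dj for xj, dj in zip(x, d)]))
        diff = [qk - pk for qk, pk in zip(q, p)]
        d = unit([sum(W[k][j] * diff[k] for k in range(len(W))) for j in range(len(x))])
    r_adv = [eps * dj for dj in d]                      # norm eps unless the gradient vanishes
    q_adv = softmax(logits(W, b, [xj + rj for xj, rj in zip(x, r_adv)]))
    return kl(p, q_adv), r_adv


W = [[2.0, -1.0], [0.5, 1.5], [-1.0, 0.0]]
b = [0.0, 0.0, 0.0]
loss, r_adv = vat_loss(W, b, [0.3, -0.2], eps=0.5)
print(loss, math.hypot(*r_adv))
```

A constant model (all-zero `W`) has no sensitive direction, so its virtual adversarial loss is exactly zero; this is the local smoothness the regularizer rewards.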